Genome-wide analysis techniques such as chromosome
painting, comparative genomic hybridization, representa- 
tional difference analysis, restriction landmark genome 
scanning and high-throughput analysis of LOH are now 
accelerating high-resolution genome aberration localization 
in human tumors. These techniques are complemented by 
procedures for detection of differentially expressed genes 
such as differential display, nucleic acid subtraction, serial 
analysis of gene expression and expression microarray 
analysis. These efforts are enabled by work from the human 
genome program in physical map development, cDNA 
library production/sequencing and in genome sequencing. 
This review covers several commonly used large-scale 
genome and gene expression analysis techniques, outlines 
genomic approaches to gene discovery and summarizes 
information that has come from large-scale analyses of 
human solid tumors. 



Introduction 

Tumors progress through the continuous accumulation of 
genetic and epigenetic changes that enable escape from normal 
cellular and environmental controls. These aberrations may 
involve genes that affect cell cycle control, apoptosis, angiogen- 
esis, adhesion, transmembrane signaling, DNA repair and 
genomic stability. A substantial number of such oncogenes 
and tumor suppressor genes has already been discovered. 
However, large-scale genome analysis techniques suggest that 
the number of such genes may be large, perhaps strikingly 
so, and many important cancer-related genes remain to be 
discovered. Indeed, it is common to find as much as 30% of 
a solid tumor genome to be present at abnormal copy number, 
or otherwise aberrant (1), and the number of genomic mutations
may be as large as 10-^-10^ per tumor (2-4). Of course, many 
of these are likely to reflect 'noise' accumulated as genomically 
unstable tumors progress. The challenge is to distinguish the 
important genomic aberrations from the genomic noise, identify 
the affected genes, and clarify their roles in tumorigenesis, 
progression and/or response to therapy. 

Abbreviations: CGH, comparative genomic hybridization; FISH, fluorescence
in situ hybridization; LOH, loss of heterozygosity; RDA, representational
difference analysis; RLGS, restriction landmark genome scanning; SAGE,
serial analysis of gene expression.

Definition of regions of recurrent genomic aberration is one
historically important route to the identification of genes that
play a role in cancer. A number of such regions have been
identified in human cancers but the functional consequences
of most of these abnormalities are not yet known. Identification
of the affected genes in these regions, elucidation of their
functions and association of these genes with tumor progression 
are required to fully understand tumorigenesis and progression. 
Regions of recurrent abnormality are typically defined through 
comprehensive analysis of many tumors with the goal of 
finding a few tumors carrying informative aberrations that can 
be used to narrowly define the extent of the region. Large-scale
genomic analysis techniques, such as chromosome painting, comparative
genomic hybridization (CGH), representational difference analysis
(RDA), restriction landmark genome scanning (RLGS)
and high-throughput analysis of loss of heterozygosity (LOH),
are now accelerating genome aberration localization.
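
The use of a few informative tumors to narrow a region of recurrent abnormality can be illustrated with a short sketch. It is an illustration only: the function name `smallest_region_of_overlap`, the representation of each aberration as a (start, end) interval in Mb, and the example coordinates are all hypothetical and are not taken from any study cited here.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (start_mb, end_mb) on one chromosome


def smallest_region_of_overlap(aberrations: List[Interval]) -> Optional[Interval]:
    """Return the interval shared by every aberration, or None if there is none."""
    if not aberrations:
        return None
    start = max(s for s, _ in aberrations)  # latest start among all tumors
    end = min(e for _, e in aberrations)    # earliest end among all tumors
    return (start, end) if start < end else None


# Hypothetical deletions (in Mb) observed in four tumors.
deletions = [(10.0, 45.0), (22.5, 60.0), (5.0, 30.0), (24.0, 28.5)]
print(smallest_region_of_overlap(deletions))  # (24.0, 28.5)
```

In this toy case the single small deletion in the fourth tumor is the informative one: it alone reduces the shared region from 7.5 Mb to 4.5 Mb.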

Identification of genes that are expressed differently in 
normal tissues and the cancers that originate in these tissues 
is another important approach to identification of cancer- 
related genes. Differential display, nucleic acid subtraction and 
serial analysis of gene expression (SAGE) have been applied 
effectively to discover such genes. More recently, expression 
microarray analysis techniques have been developed that 
promise to allow quantitative, very large-scale analysis of gene 
expression. 

These efforts have been enabled by work from the human 
genome program in physical map development, cDNA library 
production/sequencing and in genome sequencing. For 
example, cDNA and genomic clones serve as targets for 
microarray analysis technologies; interpretation of CGH, RDA 
and RLGS data depends on physical map information, and 
gene discovery has been speeded considerably by genome 
sequencing and cDNA library characterization. 

This review covers several commonly used large-scale 
genome and gene expression analysis techniques, outlines
genomic approaches to gene discovery and summarizes 
information that has come from large-scale analyses of human 
solid tumors. 

Genome analysis techniques 

Remarkable progress has been made in recent years in the 
development of technologies for definition of regions likely to 
harbor genes important in tumorigenesis or progression. These 
include: (i) metaphase chromosome analysis; (ii) genome copy 
number mapping; (iii) high-throughput polymorphism analysis; 
and (iv) multiplex PCR analysis.

Metaphase chromosome analysis 

Analysis of metaphase chromosomes has been a cornerstone 
of cancer genetics since the introduction of modern banding
analysis techniques. Banding analysis has been especially 
useful in identifying causative chromosome rearrangements in 
leukemias and lymphomas (5). However, this approach has 
been less successful in solid tumors because of the difficulty of 
obtaining high quality, representative metaphase chromosome 
preparations and because the high level of chromosomal 
rearrangement complicates karyotype interpretation. Fluorescence
in situ hybridization (FISH) with chromosome-specific
probes (6,7) has significantly improved chromosome classification
by increasing the specificity with which chromosomes
or subregions thereof can be recognized. Staining patterns that
can be achieved range from whole chromosome staining to
precise staining of specific loci. Recently, combinatorial
labeling strategies (e.g. probe 1 stained red, probe 2 stained
green, probe 3 stained red + green, etc.) have been introduced
that allow dozens of spatially separate loci or chromosomes
to be simultaneously visualized (8-11). These techniques have
dramatically increased the accuracy and sensitivity with which
chromosome aberrations can be detected and classified, and
clearly demonstrate the limitations of conventional banding
analysis in analysis of human malignancies (12). Several
recent reviews cover developments in FISH-based chromosome
analysis (9,13-17).

Fig. 1. Fluorescence ratio hybridization is a key component of comparative
genomic hybridization and expression microarray analysis. In these
processes, two nucleic acid samples to be compared are differentially
labeled with reagents that fluoresce at different wavelengths. They are then
hybridized, along with excess unlabeled repeat-rich DNA, to the
representation of the genome onto which information is to be mapped. In
CGH, the representation may be either metaphase chromosomes or arrays of
cloned probes. In expression microarray analysis, the representation may be
arrays of cDNA clones or oligonucleotides.
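
The arithmetic behind the combinatorial labeling of FISH probes (red, green, red + green, and so on) can be sketched briefly. With n distinguishable fluorochromes there are 2^n - 1 non-empty label combinations. The function names and the choice of 24 targets (the 22 autosomes plus X and Y) are assumptions made for this illustration.

```python
from itertools import combinations


def label_combinations(fluorochromes):
    """All non-empty subsets of the fluorochrome set, smallest subsets first."""
    combos = []
    for k in range(1, len(fluorochromes) + 1):
        combos.extend(combinations(fluorochromes, k))
    return combos


def min_fluorochromes(n_targets):
    """Smallest n for which 2**n - 1 label combinations cover n_targets probes."""
    n = 1
    while 2 ** n - 1 < n_targets:
        n += 1
    return n


print(label_combinations(["red", "green"]))
# [('red',), ('green',), ('red', 'green')]
print(min_fluorochromes(24))  # 5
```

Four fluorochromes give only 15 combinations, so at least five are needed to give each of the 24 human chromosome types a unique label.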

Genome copy number mapping 

Although FISH has substantially improved metaphase chromo- 
some classification, its application in solid tumors is still 
limited by the difficulty of interpreting the complex karyotypes. 
CGH, developed in 1992, partially overcomes this by mapping 
changes in relative DNA sequence copy number onto normal 
metaphase chromosomes (18). In CGH, total genome DNAs 
from tumor and reference samples are labeled independently 
with different fluorochromes or haptens and co-hybridized to 
normal chromosome preparations along with excess unlabeled 
Cot-1 DNA to inhibit hybridization of labeled repeated 
sequences. The concept of CGH is illustrated schematically in 
Figure 1 and the results of a CGH hybridization are shown in 
Figure 2. The ratio of the amounts of the two genomes that 
hybridize to each location on the target chromosomes is an 
indication of the relative copy number of the two DNA samples 
at that point in the genome. Figure 3 illustrates the application 
of CGH to analysis of an advanced breast cancer. The remarkable
level of genomic abnormality is apparent. The principal
advantages of CGH are that it maps changes in copy number 
throughout a complex genome onto a normal reference genome 
so the aberrations can be easily related to existing physical 
maps, genes and genomic DNA sequence, and it employs 
genomic DNA so that cell culture is not required. Technical 
aspects of CGH are covered in several reviews (19-21). 
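
The ratio measurement at the heart of CGH can be sketched as follows. This is a minimal illustration, not a description of any published analysis package: the function name `cgh_calls`, the band limits of 0.85 and 1.15 standing in for the region of normal variability, and the signal values are all assumptions, and no normalization of the two channels is attempted.

```python
def cgh_calls(tumor, reference, normal_band=(0.85, 1.15)):
    """Classify each locus from the tumor:reference fluorescence ratio."""
    if len(tumor) != len(reference):
        raise ValueError("signal lists must have the same length")
    low, high = normal_band
    calls = []
    for t, r in zip(tumor, reference):
        ratio = t / r
        if ratio > high:
            call = "gain"      # above the band of normal variability
        elif ratio < low:
            call = "loss"      # below the band of normal variability
        else:
            call = "normal"
        calls.append((round(ratio, 2), call))
    return calls


# Hypothetical green (tumor) and red (reference) intensities at four loci.
green = [100.0, 160.0, 50.0, 110.0]
red = [100.0, 100.0, 100.0, 100.0]
for ratio, call in cgh_calls(green, red):
    print(ratio, call)
```

In practice the width of the normal band would be set from normal-versus-normal control hybridizations rather than fixed in advance.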

The main limitations of chromosome-based CGH are that
it is limited in resolution to 10-20 Mb, it does not provide 
quantitative information about gene dosage and it is insensitive 
to structural aberrations that do not result in a DNA sequence 
copy number change. Replacing metaphase chromosomes as 
the substrate onto which aberrations are mapped with arrays 
of well-mapped cloned nucleic acid sequences can eliminate
some of these limitations, as illustrated in Figure 1. The arrays
are constructed using a robot to place clone DNA in high- 
density arrays on glass substrates. Array densities as high as 
\OVcm can now be achieved. This approach has now been 
demonstrated in several laboratories (22-24). Initial work 
involved CGH to arrays comprised of targets spanning >100 
kb of genomic sequence such as BACs (22,23). Most recently, 
CGH to cDNA arrays has been demonstrated (24). cDNA 
arrays are attractive for CGH since they are increasingly 
available and carry a very large number of clones. In addition, 
the same array can be used to assess expression and copy 
number. However, the sensitivity of cDNA clone-based CGH 
for detection of low-level, copy number changes is likely to 
be less than that for BAC-based CGH due to the decreased
hybridization signal to the smaller clones. At present, both 
approaches appear to be useful and clearly demonstrate that 
changes in genome copy number can be detected and mapped 
at a resolution defined by the genomic spacing of the clones 
used to form the array. Furthermore, array CGH allows 
quantitative assessment of DNA sequence dosage from one 
copy per test genome to hundreds of copies per genome (23). 
Figure 4 from Pinkel et al. (23) shows an analysis of the 
amplicon structure on Chr 20q for a human breast cancer. 
Both the increased resolution of array CGH compared with 
chromosome CGH and the opportunities for quantitative aber- 
ration definition are apparent in this analysis. 

Quantitative analysis of genome copy number also can be 
accomplished using real-time, quantitative PCR (25-27). In 
this procedure, PCR is carried out in a 96-well format using
reactions containing TaqMan reporter oligonucleotides
(carrying a fluorescent reporter molecule and a quencher
molecule) as indicators of DNA sequence copy number.
The 5' exonuclease activity of the polymerase digests the TaqMan
probes during PCR, liberating the fluorescent reporter from the
quencher. As a result, reporter fluorescence can be measured
to monitor the extent of the PCR reaction. This technique may
be an alternative to analysis of LOH in cases where LOH is 
the result of a physical deletion, since polymorphism of the 
alleles being measured is not necessary. The copy number at 
each locus is measured relative to others in the same 96-well 
plate so that all sites measured are informative. The major 
drawbacks of this procedure are the labor and cost associated 
with synthesis of the TaqMan probes and the relatively small 
number (hundreds) of loci that can be conveniently analyzed. 
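
Relative copy number from real-time PCR data is often estimated from threshold cycle (Ct) values. The sketch below assumes that convention together with perfect doubling of product per cycle. Neither the Ct formulation nor the function and parameter names come from the text above, and the Ct values are invented for illustration.

```python
def relative_copy_number(ct_test_tumor, ct_ref_tumor,
                         ct_test_normal, ct_ref_normal, efficiency=2.0):
    """Fold change in copy number of a test locus in tumor versus normal DNA.

    Each Ct is first expressed relative to a reference locus measured in the
    same sample; the tumor value is then compared with the normal value.
    """
    delta_tumor = ct_test_tumor - ct_ref_tumor
    delta_normal = ct_test_normal - ct_ref_normal
    return efficiency ** -(delta_tumor - delta_normal)


# Test locus crosses threshold two cycles earlier in tumor: four-fold gain.
print(relative_copy_number(24.0, 26.0, 26.0, 26.0))  # 4.0
# Test locus crosses threshold one cycle later in tumor: loss of one of two copies.
print(relative_copy_number(27.0, 26.0, 26.0, 26.0))  # 0.5
```

Because each locus is scaled to a reference locus rather than to a second allele, no polymorphism is needed for the measurement to be informative.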

RLGS is another genome scanning technique that allows 
identification of genome copy number differences (28,29). In 
addition, it enables identification of mutations and polymorph- 
isms. In RLGS, a test genome is digested with a rare cutting 
enzyme like NotI and the resulting fragments are radioactively
labeled. The DNA is then digested with a second restriction 
enzyme and electrophoretically separated on an agarose gel. 
Finally, the DNA fragments in the gel are digested with a 
third enzyme and electrophoretically separated in a second 
dimension by placing a strip from the agarose gel along the 
top of an acrylamide gel. The radioactively labeled landmark 
DNA fragments are then detected using autoradiography or 
by phosphorimaging. Up to 2000 separate loci can be analyzed
in a single experiment. Applications of RLGS include high- 
speed construction of linkage maps (30,31), quantitative ana- 
lysis of copy number at each landmark locus (32) and detection 
of mutations involving the restriction sites used to prepare the 
two-dimensional (2D) electropherograms. Landmark fragments
of interest may be cloned by excising the DNA from that part
of the 2D gel (33). Alternatively, cloned NotI/EcoRV boundary
library elements can be mixed with the genomic DNA during
RLGS so that landmark loci present in the library appear to
be increased in intensity after RLGS (34). The corresponding
clones can then be obtained from the library for further analysis.
RLGS can be made sensitive to methylation differences by
employing methylation-sensitive restriction enzymes (29,34).
The principal limitations of RLGS are that many landmark
loci are not associated with existing physical maps and it does
not permit high-resolution analysis of interesting regions.

Fig. 2. Fluorescence photomicrograph showing the results of a CGH analysis of the human breast cancer cell line MCF7. MCF7 DNA was labeled green and
normal reference DNA was labeled red. The chromosomes were counterstained with DAPI. Thus, regions of weak hybridization appear blue, regions of
increased copy number appear green and regions of decreased copy number appear red.

Fig. 3. A CGH analysis of an advanced breast cancer. The data are arranged
along the x-axis with chromosome 20pter to the left and chromosome 22qter
to the right. The green:red CGH ratio is plotted along the y-axis. The gray
band indicates the region of normal variability. Thus, values above the band
show significant increases in copy number and values below the band show
significant decreases in copy number.

Fig. 4. An array CGH analysis of genome copy number changes along
chromosome 20 in a human breast cancer (adapted from ref. 23). The
resolution and dynamic range are much improved compared with
chromosome CGH.

J.W.Gray and C.Collins

Arbitrarily primed PCR (AP-PCR) also has been used to
detect somatic genetic alterations in human tumors (35). In this
approach, DNA fingerprints generated by PCR amplification of
normal and tumor tissue DNA using single arbitrary primers are
separated electrophoretically and compared. Differences in
PCR fragment band intensity (visible as AP-PCR bands with
decreased or increased intensities in tumor tissue DNA relative
to normal) indicate regions of the tumor genome that are
present in altered copy number. Sequences located at these
sites can be cloned after reamplification with the same arbitrary
primer. This approach also has been applied to detection of
gene expression differences between tumor and normal tissue
(36) (see below).

High-throughput analysis of polymorphisms

The adaptation of automated DNA sequencers for the analysis
of DNA fragment length polymorphisms has facilitated linkage
studies and analysis of LOH by allowing rapid, inexpensive
measurements of a large number of loci in multiple samples
(37). Automated analysis is accomplished by PCR amplifying
loci containing length polymorphisms such as those produced
by simple sequence repeat length variations. To allow automated
analysis, one member of each primer pair is synthesized
to contain a fluorescent label (38). Amplification results in
allele-specific products that can be distinguished during electrophoresis.
Use of separable fluorescent labels allows analysis
of several different loci in each lane in the gel. Careful
selection of PCR primers to amplify regions of different size
allows simultaneous analysis of even more loci in each lane.
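
One common way to score LOH from such fluorescent fragment data is to compare the ratio of the two allele peaks in tumor DNA with the same ratio in matched normal DNA. The sketch below illustrates that calculation under stated assumptions: the peak heights are invented, the cutoff of 0.5 is an arbitrary example value rather than a recommended one, and the function names are hypothetical. Only loci that are heterozygous in normal DNA are informative, so both normal peaks are assumed to be non-zero.

```python
def allelic_imbalance(normal_peaks, tumor_peaks):
    """Tumor allele ratio divided by normal allele ratio, folded into (0, 1]."""
    n1, n2 = normal_peaks
    t1, t2 = tumor_peaks
    ratio = (t1 / t2) / (n1 / n2)
    # Fold the ratio so the result does not depend on which allele is reduced.
    return min(ratio, 1.0 / ratio)


def has_loh(normal_peaks, tumor_peaks, threshold=0.5):
    """Call LOH when one allele is reduced beyond the chosen threshold."""
    return allelic_imbalance(normal_peaks, tumor_peaks) < threshold


normal = (1000.0, 900.0)           # two allele peak heights in normal DNA
print(has_loh(normal, (950.0, 300.0)))   # True: second allele strongly reduced
print(has_loh(normal, (1000.0, 880.0)))  # False: both alleles retained
```

Contaminating normal cells in a tumor sample raise the observed ratio toward 1, which is one reason the threshold has to be chosen empirically.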

Single nucleotide polymorphisms (SNPs) also can be
detected efficiently by hybridization of fluorescently labeled,
PCR-amplified representations of the genome to arrays composed
of oligonucleotides (39,40). Both alleles of each of
several thousand SNP markers and single base mismatch 
targets may be represented on an array. The stringency of the 
hybridization reaction is adjusted so that hybridization is 
diminished if a single base mismatch exists between the probe 
and oligonucleotide substrate. Thus, the presence or absence 
of an allele in the hybridization mixture can be determined by 
its hybridization signature. This technique is highly parallel 
and scales well to genome-wide assessments of linkage or
LOH. However, its robustness for analysis of formalin-fixed, 
paraffin-embedded samples from pathology archives remains 
to be assessed. 

Representational difference analysis (RDA)

Techniques like array CGH or analysis of LOH are limited at 
present since they only detect aberrations at the test loci. 
Nucleic acid subtraction strategies overcome this limitation 
(41-43). These techniques allow detection and cloning of 
segments of DNA that differ in copy number between two 
complex genomes. In this approach, the two genomes to be 
compared are enzymatically restricted and ligated to linker 
oligonucleotides. This process creates 'representations' of the 
two genomes to be compared that can be amplified using 
primers for the adapter oligonucleotides. Differences between 
two genomes are detected by denaturing representations from 
both genomes and hybridizing the 'tester' genome against 
excess amounts of the 'driver' genome. The representation 
elements common to each will form driver-tester heteroduplexes,
whereas elements that are present only in 'tester' DNA
form tester-tester homoduplexes. Special linkers attached to
the representations before hybridization allow PCR amplification
of homoduplexes but suppress the amplification of heteroduplexes.
Thus, the products of the amplification are strongly
enriched in sequences present only in the tester DNA. These 
strategies reveal differences that result from deletion, ampli- 
fication or mutation of the test genome. However, since they 
survey representations of the genome, several representations 
must be tested for comprehensive difference discovery. 

Genome sequence analysis

Once regions of abnormality or susceptibility have been defined 
to within ~1 Mb, it becomes feasible to assess genes and 
associated regulatory sequences in these regions. This process 
typically begins with analysis of candidate genes already 
mapped to these regions. This is increasingly productive as 
the number of mapped ESTs in public databases increases. 
The work of the CGAP (http://www.ncbi.nlm.nih.gov/CGAP/),
TIGR (http://www.tigr.org/) and IMAGE
(http://www-bio.llnl.gov/bbrp/image/image.html) programs is especially
important in this regard. However, there is risk with the
candidate gene approach of missing crucial genes. Moreover, 
the candidate gene approach reveals little about the regulatory 
elements of the candidate genes. Thus, comprehensive techniques
for gene discovery and characterization are needed.
Large-scale genomic sequencing is now sufficiently advanced 
that it is reasonable to consider genomic sequencing as a 
practical tool for gene discovery. The worldwide effort is 
scheduled to complete a 'draft' sequence of the human genome 
by mid-2000 and to have the complete 'Bermuda quality' 
sequence by 2003 (44). This will dramatically speed discovery 
of candidate genes in regions of recurrent abnormality or 
linkage to disease susceptibility. 

However, human sequence by itself is not adequate for
comprehensive biological annotation since computational tools 
are not yet sufficiently robust to accurately identify exons, 
predict their boundaries and to group them into functional 
genes. In addition, regulatory regions may be missed and 
the false positive rate for transcription factor binding site
predictions remains too high. Comparative genomic sequencing 
of syntenic regions in other organisms promises to substantially 
enhance assessment of gene function and discovery of coding 
and regulatory sequences (45). The power of this approach has 
already been demonstrated in comparisons of Caenorhabditis 
elegans and Caenorhabditis briggsae genomic sequences 
(46,47) and through comparisons of human genomic sequences 
with those from model organisms such as C. elegans (48,49), 
Drosophila (50), Saccharomyces cerevisiae (51) and rodents 
(52). Comparison between human and mouse appears especi- 
ally useful for genomic sequence interpretation because of the 
increasing importance of mouse models in the elucidation of 
human cancer gene function. The finding of conserved regu- 
latory sequences in introns in mid-gene (53,54) is intriguing 
since it raises the possibility that comparative genomic sequen- 
cing may reveal previously unsuspected regulatory regions. 

Gene expression 

Remarkable progress has been made in recent years in the 
development of techniques to identify differences in gene 
expression between cell populations. At least three approaches 
are now in widespread use: (i) differential display; (ii) nucleic
acid subtraction; (iii) analysis using expression microarrays. 

Differential display 

This well-established technique is used to identify and isolate 
genes that are differentially expressed between two cell popula- 
tions (55). In this approach, mRNA sequences from cell 
populations to be compared are reverse transcribed and ampli- 
fied by PCR using a set of oligonucleotide primers, one 
anchored to the poly(A) tail and the other to a short arbitrary 
oligonucleotide that binds at varying distances from the poly(A)
tail for the various RNA molecules. For some RNA molecules,
the separation between the two primer sequences is too large
to allow PCR amplification so that only a subset of RNA
molecules is amplified. Separation of the amplified sequences
on a DNA sequencing gel allows visualization of each of 
the amplified sequences. Comparison of gels for two cell 
populations reveals sequences that are abundant in one but not 
the other. Use of several different primer sets allows analysis 
of a larger number of genes. Sequences of interest may be 
excised from the gel and cloned. The advantages of differential 
display include its ease of use and its power to discover 
previously unknown differences. Its principal disadvantages 
are that not all differences are discovered using a single 
arbitrary primer, recovery of interesting DNA fragments is 
somewhat time consuming and differences in levels of expres- 
sion are difficult to quantify. Nonetheless, this technique has 
been widely and successfully applied to analysis of human 
malignancies. 

Nucleic acid subtraction 

Techniques to clone differences between two mRNA popula- 
tions are well developed. The principles are similar to those 
for RDA. The process begins with reverse transcription of the 
mRNA from two populations to form cDNA. In one approach, 
the 'driver' cDNA is labeled to allow affinity separation of 
the labeled driver sequences. The driver cDNA is then hybrid- 
ized in excess to 'tester' cDNA from the other population 
and the driver-driver and tester-driver hybrid molecules are 
removed by affinity separation (56). Alternately, the driver 
cDNA and hybrid molecules are enzymatically removed by 
digestion with exonucleases rather than by physical parti- 
tioning (57). 

Serial analysis of gene expression (SAGE) 

The relative frequency of gene expression can also be deter- 
mined by sequencing a large number of cDNA fragments in 
a library prepared from the cells or tissue of interest (58,59). 
This is accomplished by ligating together short (~10 bp)
sequence 'tags' from the 3'-most NlaIII restriction sites of
multiple genes. The tags are separated by distinctive linker
sequences so the various sequences can be distinguished. The
ligated sequences from many different concatemers are then
sequenced and the results compiled to form a distribution 
showing the frequencies of the various gene-associated tags. 
This process is sufficiently efficient that 10^-10^ tags can be 
sequenced from each library. The main advantage of SAGE is 
its unbiased assessment of the frequencies with which genes 
are expressed. Disadvantages include the lack of clones from 
novel tags that may appear during sequencing and the need 
for extensive sequencing to accurately assess levels of expres- 
sion of weakly expressed genes. NCI resources for SAGE 
analysis as well as a more detailed description can be found 
at http://www.ncbi.nlm.nih.gov/SAGE/. 
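
The tag-counting step of SAGE can be sketched in a few lines. The sketch assumes that each transcript is supplied 5' to 3' as a plain string without its poly(A) tail, that the tag is the 10 bp immediately downstream of the 3'-most NlaIII site (CATG), and that transcripts lacking a site or a full-length tag are simply skipped. The sequences and function names are invented for illustration.

```python
from collections import Counter

ANCHOR = "CATG"    # NlaIII recognition sequence
TAG_LENGTH = 10    # tag length assumed for this sketch


def sage_tag(cdna):
    """Return the tag following the 3'-most anchor site, or None if unavailable."""
    pos = cdna.rfind(ANCHOR)
    if pos == -1:
        return None
    start = pos + len(ANCHOR)
    tag = cdna[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


def tag_frequencies(transcripts):
    """Count how often each tag occurs in a collection of transcripts."""
    tags = (sage_tag(t) for t in transcripts)
    return Counter(t for t in tags if t is not None)


# Hypothetical library: one transcript seen twice, one with two CATG sites,
# and one with no CATG site at all.
library = [
    "GGCATGAAACCCGGGTTTACGT",
    "GGCATGAAACCCGGGTTTACGT",
    "CATGTTTTTTTTTTCATGACGTACGTACGG",
    "AAAAAA",
]
print(tag_frequencies(library))
```

The transcript without a CATG site contributes nothing to the counts, which is one concrete way a gene can escape detection in this kind of analysis.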



Expression microarrays 

Enormous progress has been made in recent years in the 
development and DNA sequence characterization of cDNA 
clones from the human, mouse and other model organisms. In 
humans, these data have been computationally assembled into 
over 8000 genes and 83 000 clusters. The cDNA clones 
associated with these sequences are publicly available. These 
clones and their associated sequences form the basis for a 
powerful microarray approach to large-scale analysis of gene 
expression (60-64). In this approach, labeled mRNA samples 
are hybridized to arrays of cDNA clones or oligonucleotides 
derived from the associated sequences. The arrays may be on 
silicon or membrane substrates. The labeled probes may be 
labeled radioactively or with fluorescent reagents so that 
the resulting hybridization signals can be detected using 
autoradiography, phosphorimaging or fluorescence imaging. 
cDNA and oligonucleotide arrays have been made using
robots to move DNA from microtiter trays to silicon substrates
or to nylon membranes (62,63,65). This approach is flexible
and is especially well suited to production of custom arrays
but has also been applied to make large-scale arrays carrying as
many as 40 000 different clones. An alternative is to synthesize 
oligonucleotide arrays directly on silicon substrates using 
photolithographic approaches (61,66). These techniques work 
by projecting light through a photolithographic mask onto the 
synthesis substrate. The light 'deprotects' the surface so a
nucleoside carrying a photolabile protecting group can be 
added. The synthesis proceeds by deprotecting all the areas that 
are to receive a common nucleoside, coupling that nucleoside, 
deprotecting areas to receive another nucleoside and so on. 
Thus, four cycles and four photolithographic masks are required 
to add one base to the entire array. The photolithographic 
approaches allow production of large numbers of very high- 
density arrays. However, the initial setup costs are high so this 
approach is best suited to production of large numbers of 
'standard' arrays. One alternative is to use a scanned light 
spot to deprotect each 'feature' independently (67). However,
this approach has not yet been fully developed. Single oligonu- 
cleotide arrays on silicon substrates have been constructed 
with elements representing more than 40 000 genes/ESTs 
while densities on membranes are somewhat lower (60). 
Several review articles on microarray technologies and their 
applications can be found in 'The Chipping Forecast' (68). 
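
The statement that four cycles and four masks are needed per added base can be made concrete with a small sketch of a mask schedule for photolithographic synthesis. It is a simplified model under stated assumptions: all probes have the same length, nucleosides are always offered in the fixed order A, C, G, T, synthesis direction and chemistry are ignored, and the names `mask_schedule` and `synthesize` are hypothetical.

```python
NUCLEOSIDES = "ACGT"


def mask_schedule(probes):
    """For each layer and nucleoside, list the features that must be deprotected."""
    length = len(probes[0])
    if any(len(p) != length for p in probes):
        raise ValueError("all probes must have the same length")
    schedule = []
    for layer in range(length):
        for base in NUCLEOSIDES:
            features = [i for i, p in enumerate(probes) if p[layer] == base]
            schedule.append((layer, base, features))
    return schedule


def synthesize(schedule, n_features):
    """Replay the schedule to confirm that it rebuilds every probe."""
    grown = [""] * n_features
    for _, base, features in schedule:
        for i in features:
            grown[i] += base
    return grown


probes = ["ACG", "AGT", "TCG"]       # three hypothetical array features
schedule = mask_schedule(probes)
print(len(schedule))                  # 12 steps: 4 masks x 3 layers
print(synthesize(schedule, len(probes)) == probes)  # True
```

The number of masks depends only on probe length, not on the number of features, which is consistent with high setup costs that are recovered when many copies of a standard array are made.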

Large-scale analyses of solid tumors 

Genome scanning techniques such as chromosome painting, 
spectral karyotyping, analysis of LOH and CGH clearly
demonstrate a remarkably high degree of chromosome 
rearrangement in human solid tumors (Figures 3 and 4). 
Analyses of LOH and CGH and interphase FISH are especially 
useful since they can be applied to uncultured primary tumors 
and thus give a relatively unbiased view of the spectrum of 
abnormalities. In addition, they map abnormalities onto existing 
physical maps of the genome. A comprehensive review of the 
literature is impossible in this publication because of the 
remarkably large number of tumor analyses that have been 
published in recent years thanks to these more efficient analysis 
technologies. However, several general features of solid tumors 
are becoming apparent. Several of these are reviewed below. 
The associated references are necessarily anecdotal and only 
intended as an entry point to the literature. 

Recurrent genome aberrations

Cytogenetic and genome analysis techniques have revealed
numerous regions that are frequently abnormal in tumors of 
the same type. The number of such regions in any given tumor 
type can be large, presumably the result of malfunction of 
components of damage surveillance systems, DNA repair and/ 
or mitotic apparatus that lead to chromosomal and genetic 
instability. As an example, over 30 regions of abnormal copy 
number or LOH have been identified in breast cancers (69,70). 
In addition, many of these regions appear abnormal in
multiple tumor types. Several well established oncogenes, 
tumor suppressor genes and genes associated with cancer 
susceptibility have been mapped to regions of recurrent abnormality.
These include MYC (amplification at 8q24), AKT2
(amplification at 19q13), ERBB2 (amplification at 17q21.2),
CCND1 (amplification at 11q13), p53 (LOH at 17p13), BRCA1
(LOH at 17q21) and BRCA2 (LOH at 13q12). In addition,
many well mapped regions of genetic susceptibility in mouse 
tumors localize to regions of LOH/genome copy number 
abnormality (71). These data strongly support the notion that 
regions of recurrent abnormality encode genes that contribute 
to cancer progression when differentially expressed because 
of mutation, loss or amplification. 

Several cancer-related genes have been identified based on
their locations in regions of recurrent LOH including the
multiple endocrine neoplasia type 1 gene (MEN-1) at 11q13
(72), the fragile histidine triad (FHIT) gene at 3p14.2 (73),
and a cell adhesion associated gene [CDH1 at 16q22 (74)].
Genes discovered in regions of abnormality identified using
CGH include the androgen receptor (AR) gene, located in a
region of increased copy number at Xq12 in hormone-refractory
prostate cancers (75-77), and PIK3CA, located in a region of
recurrent abnormality at 3q26 in ovarian cancer (78). In
addition, several genes have been identified in regions of
amplification at 20q including the steroid receptor co-activator,
AIB1 (79,80); a putative Zn-finger transcription factor, ZNF217
(81), associated with instability and immortalization (82), and
a centrosome-associated serine/threonine protein kinase STK15
(83). The protein tyrosine phosphatase PTEN was discovered
using RDA and has been implicated as a tumor suppressor gene.
Comprehensive reviews of regions of common cytogenetic and
genomic abnormalities by Mitelman et al. (84) and Knuutila
et al. (85,86), respectively, illustrate the vast number of
regions that remain to be explored. Discovery and functional 
characterization of the genes in these regions and determination 
of the order, if any, in which they occur, should lead to a much 
improved understanding of carcinogenesis and progression. 

Progression 

Association of specific abnormalities with progression is 
important since this may lead to identification of early events 
that may guide the development of markers for early detection, 
shed light on the mechanisms associated with carcinogenesis, 
provide information about the mechanisms of progression and 
lead to development of improved prognostic or predictive 
markers. Several studies have investigated progression-related 
events. This process is best worked out in colon cancer where 
lesions at various stages of progression are readily apparent 
(87). Unfortunately, such visual clues are not as apparent in 
many other tumor types, so events associated with progression 
are much less well established. Nonetheless, genome scanning 
techniques have revealed several general aspects of progression 
in other tumor systems. As expected, the overall number of 
abnormalities increases with indicators of progression such as 
grade and/or stage (88-90). Somewhat surprising, however, 
is the finding that genomic aberrations occur early during 
progression and are often well established prior to the onset 
of invasion (91). Finally, comparison of invasive cancers with 
distant metastases reveals remarkable genomic similarities 
between these two disease stages in the majority of cancers 
(90,92-94). In fact, related tumors are generally sufficiently 
similar that genome scanning techniques may well be able to 
distinguish de novo from recurrent tumors (95). Numerous 
regions of specific abnormalities also have been associated 
with progression. However, the details are sufficiently tumor- 
type-specific that their description is beyond the scope of 
this review. 

Although many recurrent abnormalities have been identified, 
the exact spectrum of aberrations often varies according to 
tumor histology, genetic or ethnic background. Invasive ductal 
and lobular breast carcinomas, for example, show distinct 
patterns of recurrent abnormalities with 16q loss more frequent, 
and 8q and 20q gains less frequent in lobular cancers than in 
ductal cancers (96). Likewise, in brain tumors, LOH involving 
D17S379 (17p13.3) is associated with high-grade malignancies 
such as anaplastic astrocytoma and glioblastoma multiforme 
(97). Genetic background also influences the spectrum of 
accumulated abnormalities. The most extreme example 
involves patients with an inherited predisposition to microsa- 
tellite instability (98). Tumors from these patients show few 
chromosomal or LOH abnormalities while those from patients 
lacking this mutator phenotype show a much larger number 
of chromosomal abnormalities, consistent with the idea that 
these two classes of tumor evolve through different genetic 
mechanisms (99). Genetic-background-specific differences in 
aberration spectrum also have been observed between sporadic 
breast cancers and those arising in patients with BRCA1 
and BRCA2 (100). Both the number and the spectrum of 
abnormalities differ between these groups of patients, with 
tumors arising in patients carrying predisposing genetic lesions 
having more accumulated abnormalities. Given this influence 
of genetic background on the spectrum of abnormalities, it 
would not be surprising to find that the spectrum of abnormalit- 
ies differs according to ethnic background. CGH studies (101) 
suggest such differences. Specifically, gain of the 4q25-q28 
region appears to be much more common in prostate tumors 
from African-American patients compared with those from 
Caucasian patients. Taken together, these data support the 
hypothesis that recurrent abnormalities encode genes that play 
important roles in cancer progression. They also indicate that 
tumors of different histology and genetic background may be 
significantly different at the genetic level and thus likely to 
differ in their biological characteristics and clinical behavior 
including response to therapy. 

Inter-tumor heterogeneity, prognostication and prediction of 
response to therapy 

Substantial genomic and expression differences also can exist 
between tumors that appear clinically similar. This is likely 
one reason why tumors that appear similar can progress and 
respond to therapy in dramatically different ways. The existence 
of these differences and the availability of convenient technolo- 
gies for their detection have stimulated an international effort 
to identify molecular determinants of tumor behavior with the 
goal of improving the precision with which tumors can be 
classified according to outcome (102-104). Numerous gene- 
specific studies support the concept of molecular tumor classi- 
fication. Particularly strong associations with response to 
chemotherapy have been established between genes in the 
p53, signaling and apoptosis pathways (105-108). Large-scale 
genome and gene expression profiling studies suggest the 
existence of a much larger number of therapy-associated genes. 
Several reports have associated clinical outcome with genome 
copy number increases and/or decreases 
(109-115), LOH (73,116-121), general level of genomic disar- 
ray [e.g. the total number of abnormalities as measured using 
genome scanning techniques such as CGH, LOH or RLGS 
(1,122,123)] and differential gene expression (124-126). These 
studies suggest the possibility of using individual tumor genetic 
profiles to predict responses to specific therapies if applied 
before the start of treatment or to detect conditions of 
resistance if applied during the course of treatment. 
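
The 'general level of genomic disarray' mentioned above can be made concrete with a minimal sketch. It assumes a CGH profile is available as a list of tumor/normal ratios per chromosome arm, and it counts each contiguous run of gain or loss as one abnormality. The thresholds (1.25 and 0.75), the function names and the example values are illustrative assumptions, not values taken from the studies cited here.

```python
from typing import Dict, List

# Assumed cutoffs for calling gain or loss from a tumor/normal ratio.
# These are illustrative only; published studies choose their own thresholds.
GAIN_THRESHOLD = 1.25
LOSS_THRESHOLD = 0.75


def call_state(ratio: float) -> str:
    """Classify a single ratio as 'gain', 'loss' or 'normal'."""
    if ratio >= GAIN_THRESHOLD:
        return "gain"
    if ratio <= LOSS_THRESHOLD:
        return "loss"
    return "normal"


def count_abnormalities(profile: Dict[str, List[float]]) -> int:
    """Count contiguous runs of gain or loss over all chromosome arms."""
    total = 0
    for ratios in profile.values():
        previous = "normal"
        for ratio in ratios:
            state = call_state(ratio)
            # A new abnormality starts when an aberrant state begins
            # or switches from gain to loss (or the reverse).
            if state != "normal" and state != previous:
                total += 1
            previous = state
    return total


# Hypothetical profile: ratios ordered along each chromosome arm.
example_profile = {
    "8q": [1.0, 1.3, 1.4, 1.0],   # one gain
    "16q": [0.6, 0.7, 1.0, 0.7],  # two separate losses
    "20q": [1.5, 0.5],            # a gain followed directly by a loss
    "1p": [1.0, 1.1],             # no abnormality
}

total_abnormalities = count_abnormalities(example_profile)
```

A per-tumor count of this kind is one simple summary that could be compared with outcome; it ignores the size and location of each aberration.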

Differential gene expression 

Differential display, SAGE and subtraction techniques have 
been applied to solid tumors and tumor cell lines, although 
the total number of samples analyzed is far fewer than have 
been analyzed using genome analysis technologies. The smaller 
number is mostly due to the difficulty of obtaining tumor 
material prepared in a manner that preserves mRNA. Nonethe- 
less, these studies have revealed numerous differentially 
expressed genes, typically dozens to hundreds per analysis. 
This is not surprising, given the large number of genomic 
aberrations that exist in most solid tumors. SAGE has been 
applied to study gene expression differences between normal human 
bronchial/tracheal epithelial cell cultures and non-small cell 
lung cancers (127) and to assess differences in gene 
expression between cells with functional and non-functional 
p53 (128). Differential display has been more extensively 
applied. Recent studies include analyses of gene expression 
changes associated with quiescence or late G1 phase of the 
cell cycle in human breast cancer cells (129), identification of genes 
regulated by androgen in an androgen-responsive prostate 
cancer cell line (130), and identification of genes that are differenti- 
ally expressed in prostate cancers (131). Differential display 
also has been used to compare gene expression in normal 
and tumor-derived human mammary epithelial cells (132) and to 
identify genes associated with progression in breast cancer 
(133). Expression microarrays have not yet been applied 
extensively to human tumors. However, analyses of cell lines 
and model organisms illustrate their potential. Recent applications 
include analysis of radiation stress response (134,135), analysis 
of response of human fibroblasts to serum (136), detection of 
differences in gene expression between normal and neoplastic 
human ovarian tissues (137) and analysis of gene expression 
in alveolar rhabdomyosarcoma cell lines (138). These studies 
show the potential of expression microarrays both for detection 
of differences in gene expression and for quantitative analysis 
of gene expression levels. However, they also reveal the 
dramatic changes in gene expression that can result from 
sample handling and cell culture as well as the difficulty of 
interpreting gene expression changes in hundreds or thousands 
of genes. Success in this venture will depend critically on the 
development of new information management and interpreta- 
tion techniques (139-141). 
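
As a small illustration of the interpretation problem, the sketch below ranks genes by the log2 ratio of tumor to normal expression and keeps those that change by at least two-fold. It assumes expression is available as one numeric value per gene in each sample. The gene names, the pseudocount of 1.0 and the two-fold cutoff are hypothetical choices for illustration and do not come from the studies cited above.

```python
import math
from typing import Dict, List, Tuple


def log2_ratios(
    tumor: Dict[str, float],
    normal: Dict[str, float],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Return log2(tumor / normal) for genes measured in both samples.

    The pseudocount (an assumption) avoids division by zero for
    genes with no detectable signal.
    """
    shared = tumor.keys() & normal.keys()
    return {
        gene: math.log2((tumor[gene] + pseudocount) / (normal[gene] + pseudocount))
        for gene in shared
    }


def differentially_expressed(
    tumor: Dict[str, float],
    normal: Dict[str, float],
    min_abs_log2: float = 1.0,
) -> List[Tuple[str, float]]:
    """List genes whose absolute log2 ratio meets the cutoff, largest first."""
    ratios = log2_ratios(tumor, normal)
    hits = [(gene, r) for gene, r in ratios.items() if abs(r) >= min_abs_log2]
    # Sort by magnitude of change, then by name for a stable order.
    return sorted(hits, key=lambda item: (-abs(item[1]), item[0]))


# Hypothetical expression values (arbitrary units).
tumor_sample = {"GENE_A": 79.0, "GENE_B": 9.0, "GENE_C": 20.0}
normal_sample = {"GENE_A": 9.0, "GENE_B": 39.0, "GENE_C": 19.0}

hits = differentially_expressed(tumor_sample, normal_sample)
```

A single-sample ratio of this kind cannot separate biological differences from changes introduced by sample handling or culture, which is why replicate measurements and more careful statistical treatment are needed in practice.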